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ABSTRACT 


This  paper  describes  hoty  we  parallelized  a  benchmark  problem  for  parallel  simulation, 
the  Sharks  World.  The  solution  we  describe  is  conservative,  in  the  sense  that  no  state 
information  is  saved,  and  no  rollbacks^  occur.  Our  approach  illustrates  both  the  principal 
advantage  and  principal  disadvantage  of  conservative  parallel  simulation.  The  advantage 
is  that  by  exploiting  lookahead  we  find  an  approach  that  dramatically  improves  the  serial 
execution  time,  and  also  achieves  excellent  speedups.  The  disadvantage  is  that  if  the  model 
rules  are  changed  in  such  a  way  that  the  lookahead  is  destroyed,  it  is  difficult  to  modify  the 
solution  to  accommodate  the  changes.  /  /  ,  ,  --  t  / 
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1  Introduction 


The  Sharks  World  simulation  was  proposed  in  early  1990  as  a  testbed  problem  for  studying 
issues  in  parallel  simulation  [1].  Following  that  proposal,  we  were  invited  to  participate  in 
a  1990  Winter  Simulation  Conference  session  devoted  to  different  methods  for  attacking  the 
Sharks  World  problem.  We  were  asked  to  write  a  paper  that  emphasizes  the  process  by 
which  the  problem  was  parallelized  using  some  sort  of  conservative  synchronization.  Our 
background  in  parallel  simulation  has  largely  been  in  showing  how  to  extract  lookahead 
(the  ability  of  a  simulation  model  element  to  predict  its  future  behavior)  which  can  then  be 
exploited  by  any  conservative  method.  Indeed,  our  thesis  has  long  been  that  conservative 
synchronization  protocols  ought  to  be  tailored  to  the  specifics  of  the  problem  [5]. 

The  Sharks  World  is  a  conceptually  simple  simulation  designed  to  capture  many  of  the 
salient  features  of  more  complex  physical  models,  such  as  the  colliding  hockey  pucks  problem 
[2].  The  Sharks  World  has  a  torodal  topology,  and  is  populated  with  two  species:  sharks, 
and  fish.  A  creature  moves  at  a  fixed  velocity,  and  a  fixed  direction;  velocity  and  direction 
may  vary  from  creature  to  creature.  A  shark  will  eat  any  fish  that  strays  within  a  distance 
A  of  the  shark.  The  fish  disappears  from  the  simulation,  but  the  shark’s  course  remains 
unaltered. 

This  problem’s  principle  difficulty  lies  in  the  complexity  of  determining  potential  inter¬ 
actions.  When  a  fish  and  shark  are  relatively  close  in  the  domain  one  may  easily  enough 
determine  if  and  when  the  shark  could  eat  the  fish.  However,  there  is  no  guarantee  that  the 
fish  will  make  the  rendezvous,  as  it  may  be  consumed  by  a  different  shark  at  an  earlier  time. 
As  we  will  see,  the  solution  proposed  in  [1]  involves  a  certain  amount  of  event  cancelling  to 
retract  falsely  anticipated  interactions. 

Lookahead  is  absolutely  essential  to  achieve  good  performance  using  any  conservative 
synchronization  method.  Our  past  methods  for  lookahead  computation  relied  on  techniques 
such  as  the  pre-sampling  of  random  variables  [3],  and  exploitation  of  non-preemptive  queue¬ 
ing  disciplines  [4],  Identification  of  lookahead  tends  to  be  problem-class  specific.  When  we 
accepted  the  challenge  to  parallelize  the  Sharks  World,  we  accepted  the  responsibility  to  find 
lookahead  in  a  type  of  problem  we  had  not  yet  considered.  Indeed,  finding  that  lookahead 
proved  to  be  the  most  important  aspect  of  our  solution  approach. 

This  paper  chronicles  our  efforts.  We  began  by  developing  a  baseline  serial  simulation 
along  the  lines  suggested  in  [1].  The  purpose  of  this  simulation  was  to  develop  a  better 
understanding  of  the  problem,  and  to  provide  a  benchmark  for  the  eventual  parallel  simula¬ 
tion.  In  our  implementation  all  distance  and  time  quantities  are  taken  to  be  real  numbers. 
This  is  a  minor  deviation  from  the  simulation  described  in  [1]  where  distance  and  time  are 
discretized.  A  discretized  approach  is  at  variance  with  inherently  real  quantities  involved 
in  movement  calculations — sines  and  cosines  for  example.  Next  we  pondered  the  simulation 
problem,  looking  for  exploitable  lookahead.  Once  the  lookahead  was  identified  we  wrote 
a  new  serial  simulation  which  emulates  the  eventual  parallel  simulation.  The  advantage 
of  this  intermediate  step  is  that  workstations  provide  a  far  better  development  and  debug¬ 
ging  environment  than  does  almost  any  parallel  system.  The  new  serial  simulation  employs  a 
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different  computational  paradigm  than  the  original  Sharks  World  simulation,  and  on  a  work¬ 
station  implementation  runs  over  twenty  times  faster  than  the  baseline  simulation.  Having 
thus  validated  the  lookahead  ideas  we  parallelized  the  new  serial  code.  The  parallelization 
was  straightforward — it  required  only  two  hours  to  parallelize,  debug,  and  validate  the  first 
parallel  version. 

This  paper  is  organized  as  follows.  §2  outlines  the  original  sectoring  paradigm  proposed 
in  [1],  and  the  different  approach  we  adopt.  §3  describes  our  method  in  more  detail,  and 
explains  its  parallelization.  §4  addresses  performance,  and  §5  presents  our  conclusions. 


2  Overview  of  Solution  Methods 

Our  approach  to  the  problem  is  different  than  the  one  outlined  in  [1].  As  a  point  of  compar¬ 
ison  we  briefly  outline  the  original  simulation  strategy,  and  then  our  own. 

2.1  Original  Method 

The  Sharks  World  is  partitioned  into  sectors.  There  are  two  types  of  simulation  events: 
Change_Sector,  and  Attack_Fish.  The  former  occurs  when  a  fish  or  shark  passes  from  one 
sector  to  another.  The  latter  occurs  when  a  shark  attacks  a  fish.  A  rough  sketch  of  the 
basic  event  processing  follows.  In  the  interests  of  readability,  a  number  of  details  have  been 
suppressed. 

Change_Sector  Suppose  a  creature  is  entering  sector  c.  Determine  the  identity  c  of  the  next 
sector  the  creature  will  enter  if  it  manages  to  pass  through  c  unharmed,  and  determine 
the  time  tc  at  which  it  would  leave  c.  Schedule  another  Change_Sector  event  for  the 
creature,  at  time  tc.  Finally,  call  a  routine  NewAttackTimesO.  If  the  entering  creature 
is  a  fish,  this  routine  computes  the  minimal  next-attack-time  (if  any)  from  among  all 
sharks  presently  able  to  attack  sector  c.  If  the  entering  creature  is  a  shark  the  routine 
computes  its  next  attack  time  on  every  fish  currently  in  sector  c,  possibly  re-scheduling 
an  Attack_Fish  event  as  a  result. 

Attack_Fish  Cancel  the  event  where  the  fish  leaves  the  sector.  Remove  the  fish  from  the 
simulation.  Call  a  routine  NextKillTimeO  to  reschedule  the  time  of  the  next  shark 
attack  in  the  sector. 

The  basic  idea  behind  sectoring  is  to  limit  the  number  of  shark-fish  interactions  that 
have  to  be  considered  in  NextKillTimeO.  One  chooses  (square)  sectors  that  are  at  least 
as  large  in  both  dimensions  as  the  distance  A  at  which  a  shark  may  attack  a  fish.  Then  at 
any  given  simulation  time  t,  the  set  of  sharks  that  are  able  to  attack  a  given  fish  must  reside 
within  one  sector’s  distance  of  the  fish.  When  computing  the  time  of  the  next  attack  in  the 
sector  one  need  consider  only  the  sharks  that  are  close  enough  to  the  sector.  Alternately, 
one  permits  smaller  sectors  but  extends  the  search  for  sharks  to  any  sector  within  distance 
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Computation  of  the  next  attack  in  a  sector  c  has  time  complexity  0(FcSc),  where  Fc  is 
the  number  of  fish  presently  in  the  sector  and  Sc  is  the  number  of  sharks  that  can  attack 
fish  in  c.  Therefore,  as  the  sector  size  decreases  the  complexity  of  each  NextKillTimeO  call 
decreases.  However,  because  there  are  more  sectors  the  total  number  of  such  calls  increases, 
and  the  number  of  Change JSector  events  also  increases.  One  must  empirically  determine  the 
sector  size  that  optimally  manages  this  tradeoff.  A  complexity  analysis  given  in  §4  qualifies 
this  tradeoff. 

2.2  Starting  Over  From  Scratch 

A  conservative  solution  method  must  find  and  exploit  lookahead.  The  basic  problem  with 
the  Sharks  World  simulation  is  that  after  we  schedule  a  Change_Sector  event  for  a  fish,  the 
fish  may  later  be  consumed  by  a  fast-moving  shark  whose  future  presence  was  unknown  at 
the  time  we  scheduled  the  Change_Sector  event  for  the  fish.  Where  then  is  the  lookahead? 

After  much  deliberation  (and  a  few  false  starts),  we  noticed  the  most  obvious  of  lookahead 
properties:  a  shark’s  position  at  any  future  time  t  can  be  exactly  predicted.  For  that  matter, 
one  can  predict  the  future  position  of  any  fish  at  time  i,  provided  that  it  is  alive  at  time  t. 
Our  first  thought  was  to  use  the  basic  sectoring  approach,  but  then  continuously  “project” 
shark  positions  far  enough  into  the  future  so  that  whenever  a  fish  enters  a  sector,  all  sharks 
that  could  possibly  attack  it  during  its  duration  in  that  sector  are  already  known.  We  can 
then  accurately  compute  whether  the  fish  manages  to  escape  the  sector,  or  is  eaten  (and  by 
whom).  If  we  determine  that  it  escapes  we  can  confidently  report  its  departure  to  the  next 
sector  in  its  path.  Indeed,  this  is  a  viable  conservative  approach  to  the  problem.  However, 
there  is  a  simpler  and  faster  method. 

Given  the  specifications  for  a  simulation,  one  typically  attempts  to  determine  the  most 
efficient  way  to  implement  the  simulation.  When  implementing  conservative  parallel  simu¬ 
lation  one  has  to  trust  that  the  problem  specifics  will  not  change,  for  within  the  problem 
specifics  one  finds  the  needed  lookahead.  In  a  commercial  setting  there  is  a  very  real  danger 
that  mid-way  through  development  a  customer  will  change  the  problem  specifics.  This  can 
spell  disaster  for  a  conservative  approach,  for  the  changes  may  destroy  the  lookahead  a  round 
which  the  simulation  is  designed.  The  Sharks  World  simulation  is  an  excellent  example  of 
this  phenomenon. 

The  object  of  the  Sharks  World  simulation  is  to  determine  the  time,  positior ,  and  cause  of 
each  fishes’  demise.  Now  the  trajectories  of  the  sharks  and  fishes  are  comple.ely  determined 
by  their  initial  positions,  directions,  and  velocities.  In  theory  we  can  compute  the  intersection 
of  a  fishes'  trajectory  with  a  shark’s  trajectory.  By  considering  all  the  sharks,  we  can 
determine  the  earliest  time  at  which  a  shark  attacks  the  fish.  The  only  problem  is  computing 
the  trajectory  intersections.  The  section  to  follow  will  show  how  this  can  be  efficiently  done. 

The  “back-to-basics”  approach  has  many  advantages.  We  will  see  that  it  runs  over  twenty 
times  faster  on  a  Sun  Sparc  1+  workstation  than  does  the  sectoring  simulation.  We  will  also 
see  that  parallelization  is  trivial,  and  that  excellent  speeiups  are  achieved.  It  is  hard  to 
dismiss  these  advantages.  But  consider  any  minor  modification  to  the  rules  that  permit  a 


□  □ 


creature’s  trajectory  to  change:  as  a  consequence  the  lookahead  properties  are  changed,  and 
the  entire  approach  has  to  be  reworked.  Herein  lies  the  dual  nature  of  conservative  parallel 
simulation. 


3  The  Time-sliced  Intersection  Projection  Algorithm 

The  Sharks  World  problem  asks  that  we  determine  which  fish  are  consumed  within  a  time 
interval  [0,T],  the  time,  location,  and  cause  of  their  consumption.  If  we  can  efficiently 
determine  the  earliest  attack  time  between  every  fish  and  shark,  the  most  straightforward 
way  to  solve  this  problem  is  to  compute  the  minimum  attack  time  (if  any)  on  every  fish. 
We  call  this  intersection  projection,  owing  to  its  implicit  projection  of  creature  positions 
far  into  the  future.  We  will  actually  employ  intersection  projection  over  different  time- 
slices  of  the  simulation,  yielding  the  name  Time-sliced  Intersection  Projection,  or  simply 
TIP.  This  section  describes  TIP,  its  underlying  method  for  projecting  intersections,  and  its 
parallelization. 


3.1  Projections  and  Time-Slices 

The  intersection  projection  algorithm  can  be  thought  of  as  a  doubly  nested  loop.  Certain 
efficiencies  are  achieved  if  the  inner  loop  runs  over  sharks,  while  the  outer  loop  runs  over  fish. 
For,  within  the  inner  loop,  we  may  maintain  the  least  kill  time  tkm  known  so  far  for  the  fish 
fixed  as  the  outer  loop  variable.  Each  successive  inner  loop  iteration  (i.e.,  for  each  successive 
shark)  we  need  only  look  for  interactions  with  the  fish  within  the  interval  [0,  <*»/«] — any  later 
interaction  will  not  occur — thereby  reducing  the  workload  somewhat.  The  order  in  which 
we  compare  sharks  with  a  given  fish  has  a  great  deal  to  do  with  the  savings  we  achieve. 
Consider  a  fish  that  is  eaten  by  some  shark  So  early  in  the  interval  and  would  interact  (if  it 
had  lived)  with  another  shark  Si  late  in  the  interval.  If  we  compute  the  interaction  with  Si 
first  we  project  both  the  shark  and  fish  through  most  of  [0,  T ]  before  finding  the  interaction. 
If  instead  we  had  computed  the  interaction  with  So  first,  we  would  have  been  able  to  cut 
the  projection  with  Si  well  short  of  T. 

One  way  to  avoid  unnecessary  projection  is  to  use  time-slices.  Divide  [0,  y]  into  subin¬ 
tervals  of  width  At.  We  start  by  computing  all  interactions  between  sharks  and  fish  over 
[0,  At],  Any  fish  that  is  consumed  in  this  interval  is  removed  from  the  fish  list.  The  positions 
of  all  remaining  creatures  are  then  projected  forward  to  time  At,  and  we  repeat  the  process 
over  subinterval  [At, 2 At].  We  call  this  Time-sliced  Intersection  Projection,  or  TIP.  TIP  has 
the  advantage  of  limiting  unnecessarily  long  projections,  and  of  reducing  the  number  of  fish 
involved  at  each  subinterval.  It  does  suffer  the  additional  cost  of  “moving”  each  creature 
at  the  end  of  a  subinterval,  and  creates  the  problem  of  deciding  how  large  At  ought  to  be. 
Informal  experimentation  with  our  code  showed  that  approximately  a  factor  of  two  gain  in 
performance  over  no  time-slicing  was  achieved  using  At  =  T j  10.  This  rule  was  employed  in 
the  experiments  reported  in  §4. 
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3.2  Intersections  in  a  Torodal  World 


We  wish  to  determine  when  a  given  fish  and  a  given  shark  are  close  enough  for  the  shark  to 
consume  the  fish.  The  problem  is  complicated  by  the  fact  that  both  the  fish  and  shark  may 
complete  many  circuits  of  the  Sharks  World  before  meeting.  The  solution  we  present  here 
efficiently  deals  with  this  problem. 

Let  (x/(f),  t//(f))  be  the  position  of  the  fish  at  time  t ,  0/  be  its  angle  of  direction  and 
let  Vf  be  its  velocity.  Similarly  define  (xs(t),ys[t))}  9,,  and  v,  for  the  shark.  If  the  fish  and 
shark  are  to  be  within  distance  A ,  they  must  be  within  distance  A  in  each  coordinate.  Our 
approach  is  to  determine  the  functional  form  of  all  epochs  when  the  fish  and  shark  coincide 
in  x,  and  the  functional  form  of  epochs  when  they  coincide  in  y.  Around  each  epoch  there 
is  a  window  within  which  the  fish  and  shark  coordinates  differ  by  no  more  than  A.  We  look 
for  the  intersection  of  windows  around  x  epochs  and  windows  around  y  epochs. 

For  the  purposes  of  description,  view  the  behavior  of  creatures’  x-coordinates,  (x/(f)  and 
x,(f)),  as  particles  on  a  ring  of  length  M.  Xf(t )  moves  with  velocity  u/  cos0/,  and  x,(t)  moves 
with  velocity  u,  cos  9S\  the  sign  of  a  velocity  indicates  the  particle’s  direction  (clockwise  or 
counter-clockwise).  Without  loss  of  generality  assume  that  the  magnitude  of  X/(f)’ s  velocity 
is  larger  than  the  magnitude  of  x,(t)’s  velocity.  If  the  two  particles  are  moving  in  the 
same  direction  x/(f)  overtakes  i,(i)  at  relative  velocity  vXT  =  |u/cos0/  —  v,  cos0,|;  in  other 
words,  after  their  first  meeting  xf(t)  and  x,(t)  coincide  every  Px  =  M/vxr  units  of  time. 
If  the  particles  move  in  opposite  directions  they  approach  each  other  at  relative  velocity 
v„  -  lu,  cos  0/|  +  \vs  cos  9 a | ,  and  meet  every  Px  =  M/vxr  units  of  time.  The  time  lapse  Tx 
until  their  first  meeting  is  easily  determined  from  the  particles’  initial  positions.  Thus,  the 
particles  exactly  coincide  at  all  epochs 


tk  =  Tx  +  kPx  for  A:  =  0,1,2 . 

It  takes  time  Ix  =  A/vxr  for  the  two  particles  to  close  from  a  distance  A  apart.  For  every 
epoch  tk  the  two  particles  are  within  distance  A  during  [tk-Ix,  tk  +  Ix],  Exactly  the  same  sort 
of  analysis  applied  to  the  Y  coordinate  yields  the  relative  velocity  Vyr,  an  initial  intercept  time 
Ty ,  intercept  periodicity  Pv,  and  window  parameter  Iv.  Figure  1  illustrates  these  definitions. 

A  necessary  condition  for  a  shark  and  fish  to  be  within  distance  A  at  time  t  is  that  t  lie 
in  some  window  around  an  X-coordinate  epoch,  and  in  some  window  around  a  V-coordinate 
epoch.  Let  ex  and  ev  be  the  respective  x  and  y  epochs,  and  let  [31,32]  be  the  intersection  of 
the  windows  around  ex  and  ev.  At  any  time  3  6  [51,32]  the  squared  distance  between  the 
two  creatures  is 

L?(s)  (uxrs  Vxr^x')  -f-  {VyyS  U^Cy)  . 

The  time  of  interest  is  found  by  solving  for  s  satisfying  D{s)2  =  A2,  choosing  the  least  real 
solution.  If  no  real  solution  exists  the  creatures  do  not  come  within  distance  A  during  time 

The  algorithm  for  determining  the  earliest  time  at  which  a  shark  attacks  a  given  fish 
is  straightforward.  First  one  checks  to  see  if  the  shark  and  fish  are  initially  placed  within 
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Figure  1:  Time  line  of  coordinate  projections 


distance  A.  If  so  the  attack  occurs  immediately.  Otherwise  we  initialize  ex  =  Tx  and  e„  =  Ty. 
Proceedingly  iteratively,  we  check  to  see  if  [ex  —  /x,  ex  +  Ix]  fl  [ev  —  Iy,  ey  +  Iv\  ^  0.  If  the 
intersection  is  nonempiy  we  test  for  an  attack;  if  an  attack  is  discovered  we  are  finished.  If 
the  windows  do  not  intersect  or  intersecting  windows  fail  to  produce  an  attack,  we  either  add 
Px  to  ex  or  add  Pv  to  ex,  depending  on  whether  ex  <  ey  or  ex  >  ev.  The  process  repeats  until 
either  an  attack  is  discovered,  or  the  epoch  values  are  larger  than  the  simulation  termination 
time. 

In  the  worst  case  we  will  generate  all  epochs  within  the  simulation  time  span  and  not 
find  an  attack.  Assuming  that  the  maximum  creature  velocity  is  bounded  from  above,  the 
computational  complexity  of  determining  the  first  time  of  an  attack  is  0(T),  where  T  is 
the  length  of  the  simulation  time  span.  Therefore  the  overall  complexity  of  determining  the 
earliest  attack  time  on  all  fish  is  O(FST),  where  F  is  the  number  of  fish  and  5  is  the  number 
of  sharks. 

3.3  Parallelization 

The  TIP  algorithm  is  very  easily  parallelized.  We  simply  partition  the  fish  evenly  among 
processors,  and  ensure  that  within  every  time-slice  a  copy  of  every  shark  visits  every  proces¬ 
sor.  No  communication  of  sharks  is  necessary  when  the  problem  size  is  small  enough  so  that 
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every  processor  may  hold  a  copy  of  every  shark.  When  there  are  so  many  sharks  that  one 
processor  cannot  hold  a  copy  of  each  we  divide  the  sharks  into  “groups” .  A  shark  group  has 
as  many  sharks  as  a  single  processor  can  hold.  Every  processor  is  given  a  copy  of  an  entire 
shark  group.  If  there  are  k  groups  and  P  processors,  processors  0  through  P/k  —  1  get  group 
0,  processors  P/k  through  2 P/k  —  1  get  group  1,  and  so  on.  Each  processor  computes  the 
interactions  of  all  sharks  in  its  current  shark  group  with  all  its  fish.  It  then  sends  the  shark 
group  to  a  processor  that  has  not  yet  seen  a  copy  of  that  group.  This  is  accomplished  by 
having  each  processor  j  send  its  current  group  to  processor  (j  +  P/k )  mod  P. 

Our  implementation  on  the  Intel  iPSC/2  permitted  as  many  as  16,382  total  creatures  to 
reside  on  each  processor  at  a  time.  Models  this  large  are  overwhelmingly  dominated  by  the 
computation  cost — hours  of  execution  time  can  be  expected.  In  the  face  of  this  the  relative 
cost  of  moving  sharks  around  would  be  trival  on  problems  that  require  such  movement. 

4  Performance 

We  consider  the  performance  of  TIP  in  three  ways.  First,  we  use  a  simple  performance  model 
to  show  that  while  TIP’s  computational  complexity  cost  per  simulation  unit  time  on  a  fixed 
domain  has  order  ( FS ),  the  complexity  of  the  sectoring  approach  has  order  (( FS)2/Ns  + 
FS>/Ns),  where  F,S  are  the  numbers  of  fish  and  sharks  and  Ns  is  the  number  of  sectors. 
TIP  therefore  has  an  algorithmic  advantage  over  sectoring.  Secondly,  we  demonstrate  that 
our  approach  works  faster  serially  than  does  the  sectoring  approach.  Finally  we  measure  the 
parallel  performance  achieved  on  a  sixteen  processor  Intel  iPSC/2  where  each  processor  is 
based  on  the  80386/80387  chips,  has  4Mb  of  memory.  We  analyze  performance  as  a  function 
of  problem  size,  measured  by  the  total  number  of  initially  placed  creatures  and  the  length  T 
of  the  simulation  time  interval.  We  find  that  the  number  of  creatures  plays  the  predominant 
role  in  determining  good  performance.  Speedups  in  excess  of  8  are  achieved  when  as  few  as 
64  sharks  and  64  fish  are  simulated;  speedups  quickly  approach  15  as  the  number  of  creatures 
is  increased. 


4.1  Analysis 

Complexity  results  for  the  sectoring  approach  can  be  derived  from  a  simple  analytic  model. 
From  this  model  we  discover  that  if  the  domain  is  left  constant  as  the  number  of  sharks  and 
fishes  increases,  TIP  has  a  better  asymptotic  complexity  than  does  sectoring. 

Consider  a  fixed  sized  domain  where  the  number  of  sectors  Ns  is  variable,  as  are  the 
numbers  of  fish  F,  sharks  S,  and  the  simulation  time  interval  T.  There  are  three  main 
computational  costs. 

1.  Whenever  a  kill  event  is  processed,  we  recalculate  the  sector’s  next-kill-time; 

2.  Whenever  a  new  shark  comes  within  attacking  range  of  a  sector  we  compute  its  next 
attack  time  on  every  fish  presently  in  the  sector; 
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3.  Whenever  a  new  fish  enters  a  sector  we  calculate  the  minimum  attack  time  from  any 
shark  presently  able  to  attack  that  sector. 

Our  performance  analysis  looks  at  the  costs  and  frequencies  of  each  of  these  computations. 

For  the  sake  of  simplicity  assume  that  all  fish  and  sharks  are  evenly  distributed  among 
the  Ns  sectors.  First  we  consider  the  cost  and  frequency  of  the  next-kill-time  calculation. 
As  Ns  increases  the  number  of  fish  in  a  sector  decreases  as  F/Ns ,  and  the  number  of  sharks 
decreases  as  S/Ns ■  The  next-kill-time  calculation  would  seem  then  to  be  proportional  to 
FS/Ng,  however,  for  large  enough  Ns  the  calculation  involves  more  than  S/ Ns  sharks.  Any 
shark  within  attacking  range  A  of  a  sector  must  be  considered;  the  domain  within  distance 
A  of  a  sector  has  an  area  bounded  below  by  7 rA2.  The  number  of  sharks  involved  in  a  next- 
kill-time  calculation  is  therefore  asymptotically  proportional  to  S,  giving  the  calculation 
an  asymptotic  FS/Ns  complexity.  To  analyze  the  frequency  of  this  computation,  view  the 
simulation  from  a  single  shark’s  stationary  frame  of  reference.  Imagine  a  circle  of  radius  A 
drawn  around  the  shark.  Whenever  any  fish  enters  that  circle  it  is  eaten,  and  somewhere 
another  next-kill-time  calculation  occurs.  There  is  a  rate  A^  at  which  a  randomly  chosen 
fish  crosses  into  a  fixed  circle  of  radius  A;  ignoring  depletion  effects  the  ensemble  rate  at 
which  any  fish  enters  a  given  circle  is  FXA.  As  there  are  S  sharks,  the  ensemble  rate 
of  kills  (and  therefore  next-kill-time  events)  is  proportional  to  FS.  One  can  modify  this 
argument  to  include  the  effects  of  depleting  fish;  however,  the  end  complexities  are  not 
altered.  Combining  the  rate  (in  simulation  time)  of  the  next-kill-time  calculation  and  its 
cost,  we  see  that  the  computational  complexity  per  unit  simulation  time  is  asymptotically 
proportional  to  ( FS)2/Ns ■ 

The  second  type  of  computational  cost  is  suffered  whenever  a  shark  comes  within  attack 
range  of  a  sector.  The  perimeter  of  the  attack  zone  around  a  sector  is  at  least  2i rA  long; 
therefore  the  rate  at  which  sharks  cross  into  a  given  sector’s  attack  zone  is  asymptotically 
proportional  to  S  (again  a  consequence  of  the  domain  having  fixed  size).  The  calculation  is 
linear  in  the  number  of  fish  in  the  sector:  F/ Ns-  There  are  Ns  sectors  where  this  calcula¬ 
tion  occurs.  Therefore,  the  computational  complexity  per  unit  simulation  time  due  to  this 
calculation  is  asymptotically  proportional  to  (FS). 

The  third  type  of  computational  cost  is  suffered  whenever  a  fish  crosses  into  a  sector. 
One  must  compute  the  minimal  attack  time  on  that  fish  from  any  shark  able  to  attack  the 
sector.  This  cost  is  linear  in  the  number  of  sharks  attacking  the  sector,  a  number  which  is 
proportional  to  S.  The  frequency  of  this  computation  is  the  frequency  of  fish  crossing  the 
sector  boundary.  The  length  of  the  sector  perimeter  is  inversely  proportional  to  y/ Ns,  so 
the  computation  occurs  at  a  given  sector  at  a  rate  proportional  to  F/y/Ng)  collectively  it 
occurs  in  the  simulation  at  rate  Fy/Wg.  The  computational  complexity  per  unit  simulation 
time  due  to  this  calculation  is  therefore  asymptotically  proportional  to  FSy/Ns ■ 

Combining  the  costs  of  all  three  types  of  computations  we  see  that  the  overall  computa¬ 
tional  cost  per  unit  simulation  time  is  asymptotically  proportional  to  ((FS)2  j  Ns  +  FSy/Ng). 
The  most  efficient  sectoring  program  will  adapt  the  number  of  sectors  to  the  number  of  crea¬ 
tures  in  order  to  keep  the  first  term  low.  However,  in  doing  so  it  increases  the  second  term. 
The  computational  cost  per  unit  simulation  time  of  TIP  is  proportional  only  to  FS. 
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4.2  Serial  Performance 


Prior  to  engaging  in  any  parallelization  we  sought  to  determine  whether  TIP  was  in  fact  an 
efficient  solution  to  the  problem  (at  the  time  we  had  not  yet  done  the  complexity  analysis). 
The  most  straightforward  means  was  to  compare  serial  versions  of  TIP  and  a  sectoring 
simulation.  The  results  were  extremely  encouraging.  Over  a  spectrum  of  problem  sizes  the 
TIP  algorithm  computed  simulation  behaviour  over  twenty  times  faster  than  the  sectoring 
approach.  This  basic  performance  differential  remained  throughout  a  series  of  experiments 
that  sought  to  determine  the  best  sector  sizes  for  the  example  problems. 

There  are  a  whole  range  of  simulation  parameters  one  might  vary;  given  this  overly 
large  space  of  possibilities  it  seemed  to  us  that  varying  the  parameters  most  likely  to  affect 
performance  was  a  reasonable  course  of  action.  The  parameters  we  varied  have  to  do  with  the 
size  of  the  simulation:  the  numbers  of  creatures,  and  the  length  of  the  simulation  interval. 
All  other  parameters  we  left  constant,  and  at  the  values  reported  in  the  original  Sharks 
World  paper  [1].  These  values  are  given  below. 


M 

65536 

A 

50 

Velocity 

Uniformly  at  random  from  [50,200] 

Initial  X 

Uniformly  at  random 

Initial  Y 

Uniformly  at  random 

Direction 

Uniformly  at  random 

Simulation  Duration 

2000  time  units 

All  the  measurements  we  report  have  equal  numbers  of  fish  and  sharks  initially.  We  studied 
problems  with  total  creature  populations  of  32,  64,  128,  256,  512,  1024,  2048,  and  4096.  The 
table  below  gives  the  average  finishing  times  for  these  simulations  as  implemented  on  a  Sun 
Sparc  1+  workstation. 


Creatures 

Sectoring  (secs) 

TIP  (secs) 

Sectoring/TIP 

32 

1.2 

0.1 

12 

64 

3.1 

0.1 

31 

128 

8.8 

0.3 

29.3 

256 

29.2 

1.3 

22.4 

512 

107 

5 

21.4 

1024 

459 

21 

21.8 

2048 

1936 

83 

23.3 

4096 

8117 

334 

24.3 

Comparison  of  Sectoring  and  TIP  on  Sun  Sparcstation  1  + 


4.3  Parallel  Performance 

We  studied  parallel  performance  on  the  same  set  of  problems  described  above,  on  a  sixteen 
node  Intel  iPSC/2  distributed  memory  multiprocessor.  For  each  parameter  setting  we  exe¬ 
cuted  a  set  of  “short”  runs  with  T  =  2000  and  a  set  of  “long”  runs  with  T  —  100,000.  Our 
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Serial  Timings  for  Short  and  Long  Runs 


Number  of  Creatures 

Figure  2:  Timings  for  long  and  short  runs 


implementation  will  support  simulations  with  up  to  131,072  total  creatures.  However,  the 
execution  times  get  quite  long,  so  in  order  to  keep  the  serial  exection  times  within  reason 
we  limited  speedup  computations  to  runs  with  rather  smaller  numbers  of  creatures.  The 
large  problem  we  have  run  in  parallel  required  122  minutes;  for  this  problem  F  =  16386, 
S  =  16386,  and  T  =  2000. 

Figure  2  plots  timings  taken  from  the  serial  version  run  on  one  iPSC/2  node1,  and  Figure  3 
gives  the  speedups  achieved  using  sixteen  processors.  Some  experimentation  suggested  that 
we  use  a  time  slice  of  At  =  T/10.  The  value  of  “speedup”  is  not  as  rigorous  as  we  would 
like:  ideally  one  would  exhaustively  determine  the  best  time-slice  for  each  serial  run  and  use 
that  in  the  speedup  calculation.  In  fact,  we  believe  that  the  cross-over  behavior  of  the  short 
and  long  speedup  functions  is  likely  due  to  the  non-optimality  of  the  At  =  T/10  rule — in 
particular,  the  serial  timings  for  long  runs  and  few  creatures  are  probably  inflated  owing  to 
this  phenomenon.  Other  caveats  include  the  fact  that  we  did  not  include  initialization  time 
(which  matters  little  because  we  could  have  parallelized  it  had  we  spent  the  time  on  it),  nor 
do  we  include  the  10  time  required  to  report  the  fishes  final  status. 

The  parallelization  costs  which  keep  TIP  from  achieving  perfect  speedup  are  due  to  load 

‘It  is  interesting  to  note  that  there  is  apparently  a  factor  of  five  speed  differential  between  a  Sparc  1  + 
and  a  single  iPSC/2  node. 
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32  64  128  256  512  1024  2048  4096 


Number  of  Creatures 


Figure  3:  TIP  Speedups  for  long  and  short  runs 


imbalance.  Our  tricks  for  reducing  the  number  of  TIP  inner  loop  iterations  for  a  given  fish 
cause  variability  in  each  fishes  processing  time,  as  does  the  fact  that  the  forward  projection 
of  a  fish  and  shark  can  be  terminated  with  the  first  discovered  intersection.  Our  timings 
wait  for  all  processors  to  synchronize  globally,  thereby  waiting  for  the  processor  with  the 
heaviest  load  to  complete.  However,  this  degradation  will  decrease  as  the  number  of  fish 
increases,  due  to  central  limit  theorem  effects  of  reducing  the  load  variance  in  relationship 
to  the  mean. 


5  Conclusions 

The  parallelization  of  discrete-event  simulations  offers  many  challenges.  We  examined  some 
of  those  in  the  context  of  a  particular  model,  the  Sharks  World  simulation.  We  offer  two 
conclusions.  First,  knowledge  and  exploitation  of  lookahead  in  the  simulation  model  can  lead 
to  excellent  performance.  Our  search  for  lookahead  in  Sharks  World  led  us  to  a  completely 
different  solution  approach.  The  advantages  of  the  approach  are  manifold:  on  a  serial  work¬ 
station  problems  are  solved  over  twenty  times  faster  than  with  the  “usual”  discrete-event 
approach;  the  approach  is  easily  parallelized  and  achieves  high  speedups.  The  second  conclu¬ 
sion  is  that  excellent  performance  achieved  by  exploiting  lookahead  can  be  easily  thwarted 
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by  relatively  minor  changes  in  problem  specification.  Any  modification  to  the  model  rules 
that  affects  lookahead  exploitation  may  require  a  great  deal  of  modification  to  the  solution 
approach.  This  fundamental  problem  will  be  suffered  by  any  conservative  synchronization 
method  whose  performance  depends  on  lookahead.  To  the  extent  that  one  can  draw  general 
conclusions  from  this  specific  example,  we  conjecture  that  optimistic  synchronization  mecha¬ 
nisms  may  be  better  suited  than  conservative  methods  for  a  general  discrete-event  simulator; 
on  the  other  hand,  a  specific  simulation  may  have  very  good  lookahead  properties  that  can 
be  efficiently  exploited  by  a  conservative  mechanism. 
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